We performed a statistical analysis of data from the Digg.com website, which enables its
users to express their opinions on news stories by taking part in forum-like discussions as
well as to directly evaluate previous posts and stories by assigning so-called "diggs". Owing
to the fact that the content of each post has been annotated with its emotional value, the
study covers not only strictly structural properties but also the average emotional
response of the posts commenting on the main story. While analysing correlations
at the story level, we found an interesting relationship between the number of diggs and
the number of comments received by a story. The correlation between the two quantities
is high for data where small threads dominate and consistently decreases for longer
threads. However, while the correlation between the number of diggs and the average
emotional response tends to grow for longer threads, correlations between the number of
comments and the average emotional response are almost zero. We also show that the initial
set of comments given to a story has a substantial impact on the further "life" of the
discussion: highly negative average emotions in the first 10 comments lead to longer threads,
while the opposite situation results in shorter discussions. We also suggest the presence of
two different mechanisms governing the evolution of the discussion and, consequently, its length.

Keywords: correlations; collective phenomena; sociophysics 

1. Introduction 

Although the concept of physical modelling of social processes is older than the idea
of statistical modelling of physical phenomena (dating back to the times of Laplace,
Comte and Stuart Mill) [1], it is only in recent years that the methods and
models of physics have become widely (and successfully) applied to the description of
social phenomena. Thanks to its very general conceptual framework, statistical physics
has proven an exceptionally useful tool in this regard [5].

The rapid growth observed over the last fifteen years in the number of papers
in this field was due to several factors. Firstly, with advances in computational
power and storage capacity, large digital databases have become available to scientists.
Secondly, owing to the rapid development of the Internet, unprecedented social processes
appeared and prompted more and more physicists to turn their attention to the rising
domain of sociophysics.

Exploring the behaviour of complex systems comprising humans rather than
particles, physicists have so far tackled a number of phenomena: from the spontaneous
formation of a common culture and its dissemination, through the evolution of opinions
and crowd behaviour, to language dynamics [2] [3] [4]. Very recently, the study and
modelling of collective emotions in Internet communities has also gained considerable
interest.

Collective emotions are relatively straightforward to notice in online communities.
Highly emotional discussions are usually connected with very exciting, controversial
or tragic events. When many users take part in such highly emotional
communication, we call the phenomenon a collective emotion. The study of collective
emotions in online communities comprises two major areas. The first is sentiment
analysis: computational methods that extract the emotional content of a written text
and classify this content according to a set of possible dimensions. The second is
building mathematical models of the emergence of collective emotions, based on the
psychological and sociological body of knowledge on emotion. Computational simulations
of the models and comparison of their results with empirical data verify
the validity of these models.

During the last two years several papers have been published on
the presence of collective emotions on the Internet, both empirical [5] [7] [8] [9] [10]
[11] [12] and theoretical [13] [14] [15], as well as ones that stress the applicability
of the results [16] [17] [18]. The objective of this paper is to analyse the available
Digg.com dataset in search of regularities and relations that give insight
into the dependencies between the emotional content of online discussions and the
opinions expressed by their users.

2. Data and methods

Participants in Internet discussions do not transmit their emotions directly; they
communicate via text messages which, depending on their emotional content, may
induce certain emotional responses in readers. Information on the emotions of individuals
can usually be inferred from the physiological signals sent by their bodies. In online
communities, however, this is not the case; we are left only with the textual statements
of discussion participants. The question arises of how to infer information about
emotions from these statements.

Before trying to detect emotions, one needs to decide how to measure them.
A well-established psychological theory of emotion, the circumplex model, is commonly
used in sociophysics modelling. It takes into account two dimensions of emotion:
valence and arousal. The former indicates how positive or negative the emotion is,
the latter the level of personal activity caused by that emotion (from lethargic
to hyperactive) [6]. Depending on the method, sentiment analysis can usually
provide the valence or a specific combination of the two values [19]. Different procedures
of emotion detection are in use. The analyses in [19] and [20] relied on manual
(human) annotation, but this method allows only a limited number of textual
statements to be assessed. Much larger amounts of data can be processed with
automatic annotation developed within the already mentioned field of sentiment
analysis.

The application of sentiment analysis (also known as opinion mining) to the detection
and classification of emotions is a development of a field which initially
concentrated on extracting opinions [19]. In the last ten years this area of research
has seen substantial growth [21], gaining a lot of attention from industry and
academia alike. This was due to the phenomenon of Web 2.0, which led to an unprecedented
increase in the amount of online content generated by regular users rather
than by website owners or publishers. The information contained in user-generated
content (UGC) can be of pivotal importance to firms and institutions. Hence, the
first efforts in sentiment analysis focused on analysing multiple movie reviews or
comments on manufacturers' products in order to determine which
features receive the most positive and negative feedback. The two fundamental tasks of
sentiment analysis are: (i) identifying whether a text is objective or subjective (i.e.
contains facts or opinions/emotions) and (ii) determining its subjective polarity (i.e.
identifying how positive or negative it is).

In this study we used the following approach. During the training phase,
the program is fed a set of documents classified by humans for emotional
content (positive, negative or objective), from which it learns the characteristics
of each type. Afterwards, during the second phase, the algorithm applies the acquired
sentiment classification knowledge to previously unseen documents. We trained a
hierarchical Language Model [351 13 on the Blogs06 collection [53] and applied the
trained model to the Digg data. Each post is initially classified as objective or
subjective and, in the latter case, it is further classified in terms of its polarity, i.e.
positive or negative. Each level of classification applies a binary Language Model [24]
[7]. Eventually, posts are annotated with a single value e = -1, 0 or 1 to indicate their
valence (negative, neutral or positive, respectively). The dataset was obtained by
a complete crawl of the Digg site [25] for the months of February, March and April 2009.
Data concerning stories submitted during that period were collected. Information
on users, diggs and comments relevant to the stories was also gathered. The most
fundamental properties of the dataset are shown in Table 1.
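
The two-level annotation described above can be sketched as follows. This is only an
illustration of the control flow, not the classifier used in the study: the two callables
are hypothetical stand-ins for the trained binary Language Models, and the toy keyword
rules are assumptions introduced solely so that the sketch runs.

```python
from typing import Callable


def annotate(post: str,
             is_subjective: Callable[[str], bool],
             is_positive: Callable[[str], bool]) -> int:
    """Return the valence e of a post: -1 (negative), 0 (neutral) or 1 (positive)."""
    # Level 1: objective posts are treated as emotionally neutral.
    if not is_subjective(post):
        return 0
    # Level 2: subjective posts are split by polarity.
    return 1 if is_positive(post) else -1


# Hypothetical toy classifiers (NOT the Language Models used in the study).
POSITIVE_WORDS = {"great", "love"}
NEGATIVE_WORDS = {"awful", "hate"}


def toy_is_subjective(post: str) -> bool:
    words = set(post.lower().split())
    return bool(words & (POSITIVE_WORDS | NEGATIVE_WORDS))


def toy_is_positive(post: str) -> bool:
    words = set(post.lower().split())
    return len(words & POSITIVE_WORDS) >= len(words & NEGATIVE_WORDS)
```

Any pair of binary classifiers can be plugged in, since `annotate` only fixes the order of
the two decisions.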



Table 1. Fundamental statistics of the dataset.

Stories/posts  Comments  Users   Stories per user  Comments per user
1195808        1646153   484985  2.47              3.39



An example of an online user-generated content (UGC) rating network, Digg.com
relies on users to submit and moderate news stories. Each newly submitted story
goes to the Upcoming section, where users browse and vote for (or,
in the website's nomenclature, digg) the stories they like most. Once a story fulfils
certain criteria, it is promoted and moved to the Popular section, displayed as
the website's front page. The exact promotion algorithm is not known to the public (and
changes on a regular basis), but the number of votes (diggs) and the rate at which a
story receives them are the most important factors [26] [27] [28] [29] [30]. The order
in which promoted stories are featured on the front page is also determined by the
algorithm. The most interesting and relevant stories (according to the promotion
mechanism) are placed at the top of the page.

Apart from receiving diggs, each story can be commented on. Comments, in turn,
can obtain diggs (approvals), but they can also be disapproved of by
getting thumbs down/diggs down (an activity called burying). In the context of
analysing the emotional content of websites, Digg's voting system seems to be of pivotal
importance. In addition to the emotional classification carried out by the Wolverhampton
partners, the dataset allows for analyses of the number of diggs up/diggs down, which
to some extent reflects how interesting and/or emotionally engaging the assessed
story or comment was. A schematic plot of the Digg structure is shown in Fig. 1.

3. Structural properties 
3.1. Threads' lifetime 

Fig. 1. Digg structure as present in the gathered data. Each story starts with a post (an empty
ellipse) that can obtain diggs (a box inside the ellipse). The post can be commented on, as
can the comments themselves, though no deeper than the second level. The comments
(rectangular boxes) also obtain diggs but, unlike those of the post, they can be either
positive (diggs up) or negative (diggs down). Each comment is subject to emotional
classification, shown by the colour of the comment box.

We define a thread's lifetime as the time that elapsed between the first and the last
comment in the thread. Threads' lifetimes were calculated from the comments' timestamps
(exact date and time). Their histograms (using different ranges and time scales)
are plotted in Fig. 2. They reveal a clear increase in thread counts for lifetimes of
24 hours and multiples of this period (inset in Fig. 2a). Also, on a different time
scale, the character of the graph seems to change around the value of 30 days (Fig. 2b).

Most certainly this behaviour is due (at least in part) to the promotion mechanism
of the site and the presentation of top-ranked material. In addition to thematic
categories, Digg.com allows for viewing the most popular stories within a specific
period of time. On the front page a user may choose to browse the
most recent stories or the top stories of the last 24 hours, 7 days, 30 days or 365 days.
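
The lifetime definition used in this subsection can be sketched in a few lines. The
function names and the sample timestamps are hypothetical; rounding up to whole hours
follows the convention stated in the caption of Fig. 2.

```python
import math
from datetime import datetime
from typing import Sequence


def thread_lifetime_hours(timestamps: Sequence[datetime]) -> float:
    """Time elapsed between the first and the last comment of a thread, in hours."""
    if not timestamps:
        raise ValueError("a thread needs at least one comment")
    span = max(timestamps) - min(timestamps)
    return span.total_seconds() / 3600.0


def lifetime_bin_hours(timestamps: Sequence[datetime]) -> int:
    """Lifetime rounded up to an integer number of hours (histogram bin)."""
    return math.ceil(thread_lifetime_hours(timestamps))


# Hypothetical comment timestamps of a single thread (order does not matter).
comments = [
    datetime(2009, 2, 1, 12, 0),
    datetime(2009, 2, 2, 12, 0),
    datetime(2009, 2, 1, 18, 30),
]
lifetime = thread_lifetime_hours(comments)
```

Counting how many threads fall into each integer bin then gives a histogram of the kind
discussed above.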



3.2. Comments distributions 

The histogram of the number of comments for all data (Fig. 3a) shows two distinct
distributions: for lower values a power law can be observed, and then, starting around
the 20th comment, a significantly different distribution takes over. One can hypothesise
that the two distributions might be generated by two distinct classes of users
(e.g. regular ones and spammers or advertising/marketing professionals). In order
to verify this hypothesis, graphs presenting users' productivity were plotted (Fig.
3b). Productivity was measured as the number of commented stories (stories that
received at least 1 comment) and it follows a single power-law relation, indicating
a meaningful presence of only one class of users. Otherwise we would see multiple
power-law (or other) distributions, one for each group.
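
As an illustration, the productivity histogram can be computed as sketched below. The
input layout (one `(user_id, n_comments)` pair per submitted story) and the function name
are assumptions made for this example, not the format of the original dataset.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def productivity_histogram(stories: Iterable[Tuple[Hashable, int]]) -> Dict[int, int]:
    """Map productivity (number of commented stories) to the number of users having it."""
    # Only stories that received at least one comment count towards productivity.
    per_user = Counter(user for user, n_comments in stories if n_comments >= 1)
    # Histogram: how many users reached each productivity value.
    return dict(Counter(per_user.values()))


# Hypothetical data: (user id, number of comments received by the story).
stories = [("a", 3), ("a", 0), ("a", 1), ("b", 2), ("c", 0), ("d", 5)]
histogram = productivity_histogram(stories)
```

Plotting the resulting counts on log-log axes is what allows a single power law to be
told apart from a mixture of distributions.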

Fig. 2. (a) Log-log plot of the histogram of threads' lifetime given in hours. The inset shows
the same data on a semi-logarithmic scale. (b) Log-log plot of the histogram of threads'
lifetime given in days. Both hours and days are rounded up to integer values.

Fig. 3. (a) Log-log histogram of the number of comments for all data. (b) Log-log histogram of
users' productivity, i.e. the number of users that posted a certain number of stories on
Digg.com. Only stories that acquired at least one comment were taken into account. (c) Log-log
plot of the number of stories that obtained a certain number of diggs.

3.3. Diggs distributions 

As with comments, the histogram of diggs presented in Fig. 3c displays two distinct
distributions. The plot starts with a power law and then evolves into a Gaussian peak.
The power-law relation may be explained by a preferential process [31], a phenomenon
quite common in complex systems and responsible for fat-tailed distributions, including
power laws [32] [33]. The histograms represent real data rather than the
outcome of a simulation, but the same mechanism of preferential attachment
was at work here: the more diggs the main post (story) obtained in a specific
period of time, the better it was promoted by the Digg.com algorithm (front-page
placement, higher ranking position, etc.).



4. Average emotional response 

Table 2. Number of users engaged in different activities.

Activity                                                                  Number of users
gave at least 1 digg                                                      460584
posted at least 1 comment                                                 137183
submitted at least 1 story                                                218686
posted at least 2 comments                                                77804
submitted at least 2 stories                                              108068
submitted at least 1 story, each of the stories had at least 1 comment    44962
submitted at least 1 story, each of the stories had at least 4 comments   7128
submitted at least 1 story, each of the stories had at least 8 comments   3515
submitted at least 1 story, each of the stories had at least 15 comments  2224

Different posted materials trigger different emotional responses. In order to measure
the overall reaction we introduce a new quantity. We define the average emotional response
to a main post (story) as the mean value of the emotional content of the comments
(all levels) submitted to this story:

$$\langle e \rangle = \frac{1}{N} \sum_{i=1}^{N} e_i$$

where e_i ∈ {-1, 0, 1} is the emotional content of the i-th comment and N is the
number of comments (all levels) submitted to a given story.
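
The definition translates directly into code. The sketch below is a minimal illustration;
the function name and the input checks are assumptions of this example.

```python
from typing import Sequence


def average_emotional_response(valences: Sequence[int]) -> float:
    """Mean valence of the N comments (all levels) submitted to one story."""
    if not valences:
        raise ValueError("the average is undefined for a story without comments")
    if any(e not in (-1, 0, 1) for e in valences):
        raise ValueError("each valence must be -1, 0 or 1")
    return sum(valences) / len(valences)


# Hypothetical thread: two negative, one neutral and one positive comment.
example = average_emotional_response([-1, -1, 0, 1])
```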

To determine responses to materials submitted by individual users (or
published on individual websites), we group together threads started by the same
user (or originating from the same website) and calculate the averages of ⟨e⟩ in those
groups. In the following three subsections the average response to individual posts
(that is, ⟨e⟩_thread in each thread) as well as to individual users, ⟨e⟩_user, and
websites, ⟨e⟩_website, will be presented.
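
The grouping step can be sketched as follows. The example assumes that each thread is
given as a pair of a grouping key (a user id or a website name) and the thread's average
emotional response; this layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_average(threads: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Average the per-thread responses within each group (user or website)."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, e_thread in threads:
        groups[key].append(e_thread)
    # The group value is the mean of the thread averages, not of the raw comments.
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical threads: (submitting user, average emotional response of the thread).
threads = [("alice", -0.5), ("alice", 0.5), ("bob", -1.0)]
per_user = group_average(threads)
```

Averaging thread averages gives every thread the same weight regardless of its length;
averaging raw comments instead would be a different design choice.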

4.1. Threads 

The analysis concerned only commented stories, i.e. those which initiated a thread.
There were 129 998 such main posts (stories). As can be seen from Fig. 4a, due to
the large number of very short threads in the whole data there is a prevalence of ⟨e⟩
having the exact (the histogram employs 1000 bins) value of -1, 0 or 1. Similarly, the
smaller peaks correspond to simple fractions such as ±1/2, ±1/3, ±2/3, etc., which are the
only possible results for short (and quite numerous) threads. When we take into account a
subset comprising threads of 8 or more comments (Fig. 4b), we eliminate the peaks for the
three values and get a distribution resembling a normal distribution shifted slightly
towards negative values.
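
Why short threads produce isolated peaks can be checked by enumerating every value the
average can take. In the sketch below (function name hypothetical) the sum of N valences
from {-1, 0, 1} is an integer between -N and N, so the average is restricted to multiples
of 1/N.

```python
from fractions import Fraction
from typing import List


def possible_averages(n_comments: int) -> List[Fraction]:
    """All values the average emotional response can take for n_comments comments."""
    if n_comments < 1:
        raise ValueError("n_comments must be positive")
    # Every integer sum between -n and n is reachable with valences from {-1, 0, 1}.
    return [Fraction(total, n_comments) for total in range(-n_comments, n_comments + 1)]


# Distinct values reachable by threads of 1, 2 or 3 comments.
short_thread_values = sorted({v for n in (1, 2, 3) for v in possible_averages(n)})
```

Threads of up to three comments can reach only nine distinct values, which is why a finely
binned histogram dominated by short threads shows sharp peaks rather than a smooth curve.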

4.2. Users 

Fig. 4. Histograms of the average emotional response. (a) Average emotional response of threads
⟨e⟩_thread, number of bins N_bins = 50, all data. (b) Average emotional response of threads
⟨e⟩_thread, number of bins N_bins = 20, threads with a threshold N_comments > 8. (c) Average
emotional response to material submitted by specific users ⟨e⟩_user, number of bins N_bins = 20,
threads with a threshold N_comments > 8. (d) Average emotional response to content from
specific websites ⟨e⟩_website, number of bins N_bins = 20, threads with a threshold
N_comments > 8.

The collected data made it possible to track material submitted by specific users and to
measure the average overall emotional response to their posts. During the crawl there were
484 985 registered and active users who either posted a story, wrote a comment or
dug some material. Table 2 shows the number of users who engaged in different
activities.

It can be seen from the table that more users were interested in submitting a
story than in posting a comment. This is probably due to the fact that the primary appeal
of Digg.com to the vast majority of users (so-called 'light users') is to share content of
interest with others, the desire to exchange comments on the website being of slightly lower
importance.

In order to cut off the peaks at -1, 0 and 1, thresholds in the number of comments
for a given story were used. The number of stories posted by a user did not seem
a good threshold, as there were users who posted many stories of which few were
commented on. For example, the user with ID 59919, who posted links to a website on golf
in the UK, submitted 1402 stories (the greatest number of all users), of which only
5 were commented on, with a total of 6 comments, i.e. 4 stories received 1 comment
and 1 story received 2 comments. This user posted no comments of their own.

Figure 4c shows that the histograms for users are similar in character: without a
threshold in the number of comments there are 3 distinct peaks at -1, 0 and 1.
Once the threshold is introduced, we can distinguish two peaks, one of them near
-0.5.



4.3. Websites 

As was the case with individual users, the collected data also made it possible to measure
the emotional response to content from specific websites (e.g. www.youtube.com,
www.nytimes.com, news.bbc.co.uk). The top 50 most popular websites (in terms of the number
of times their content appeared on Digg.com) are presented in Table 3. The clear leader
is the video-sharing website www.youtube.com, followed by www.examiner.com (citizen
journalism) and a number of professional online newspapers/news outlets
(www.nytimes.com, news.bbc.co.uk, www.cnn.com, www.telegraph.co.uk) and news
aggregators (www.huffingtonpost.com, news.yahoo.com). It is worth noting that
by taking into account only commented posts we eliminate the impact of
material submitted for advertising or marketing purposes.

Out of the top 50 most popular websites, those with the lowest and the highest
values of the average emotional response are listed in Table 4 and Table 5, respectively.
As can be seen, websites dealing with politics generate the most negative emotions
(blogs and news websites with a political bias, the website of the British National Party),
while those concentrating on gadgets and technology, giving advice or offering humorous
content receive the most positive responses. Similar conclusions were reached in a
paper by P. Sobkowicz and A. Sobkowicz [3D]. The authors analysed data from
the Politics section of the discussion fora at one of the most popular Internet portals
in Poland, www.gazeta.pl. Using human assessment of comments for a sample of
discussion threads, they noted that aggressive comments (disagreeing, provocative
or invective) accounted for 75% of communication between Politics forum users.
This was not the case in the Sports or Science fora. The moods and opinions of Sports
section participants turned out to be usually similar, while discussions on science, if
they happened to be longer, tended to be more factual in character.




Fig. 5. Comparison of average emotional response histograms: (a) threads and users, (b) threads
and websites. Number of bins N_bins = 20, all threads with a threshold N_comments > 8.

Table 3. Top 50 most popular websites (in terms of the number of appearances).

Rank  Website                 Counts  Comments  ⟨e⟩_website
1     www.youtube.com         6808    76433      0.067
2     www.examiner.com        2512    14280     -0.131
3     www.nytimes.com         1465    34719     -0.280
4     www.huffingtonpost.com  1451    35478     -0.338
5     news.bbc.co.uk          1222    15092     -0.301
6     news.yahoo.com          1066    21225     -0.394
7     www.cnn.com             965     17773     -0.373
8     www.telegraph.co.uk     884     32447     -0.232
9     www.reuters.com         838     12928     -0.342
10    rawstory.com            729     14136     -0.563
11    www.washingtonpost.com  688     12435     -0.410
12    www.flickr.com          666     20378      0.091
13    www.foxnews.com         661     6675      -0.451
14    arstechnica.com         658     16697      0.064
15    www.squidoo.com         632     825        0.310
16    online.wsj.com          558     13298     -0.300
17    i.gizmodo.com           549     15481      0.132
18    www.time.com            532     22557     -0.243
19    www.dailymail.co.uk     505     16299     -0.245
20    www.alternet.org        497     6446      -0.389
21    www.msnbc.msn.com       479     16409     -0.239
22    www.guardian.co.uk      475     12341     -0.311
23    news.cnet.com           468     13633      0.064
24    www.worldnetdaily.com   459     4214      -0.483
25    blog.wired.com          415     11508      0.009
26    www.chicagotribune.com  402     9077      -0.121
27    www.latimes.com         379     8885      -0.313
28    www.collegehumor.com    371     9034       0.138
29    www.opednews.com        348     644       -0.249
30    www.breitbart.com       338     6899      -0.421
31    www.politico.com        325     8571      -0.499
32    www.news.com.au         317     10085     -0.127
33    bnp.org.uk              314     1753      -0.556
34    hubpages.com            301     439        0.189
35    www.break.com           295     4068       0.026
41    www.engadget.com        257     10362      0.186
42    www.usatoday.com        247     4658      -0.262
43    www.dailykos.com        243     4043      -0.384
44    www.slate.com           240     3006      -0.157
45    www.google.com          240     2920      -0.246
46    www.ebaumsworld.com     239     578       -0.003
47    www.salon.com           237     6367      -0.459
48    abcnews.go.com          234     6170      -0.257
49    blog.propertynice.com   226     248        0.338
50    www.theonion.com        221     4043       0.118

In addition, a graph presenting the average number of discussions (threads) initiated
by websites, grouped according to the value of the average emotional response to materials
coming from these websites, was plotted (squares in Fig. 7). It shows that the most
successful in terms of the number of appearances and of discussions initiated were
websites that received ⟨e⟩_website of slightly and mildly negative values. Similarly
to the distributions for threads and users, the ⟨e⟩_website histograms without a threshold
show 3 large peaks at the values -1, 0 and 1, though their proportional heights differ
between the groups of threads, users and websites. After introducing a threshold a
Gaussian-like distribution appears, slightly shifted to the left.

Table 4. Top 15 websites (out of the top 50 most popular) with the lowest ⟨e⟩_website.

Rank  Website                 Counts rank  ⟨e⟩_website  Description
1     rawstory.com            10           -0.563       liberal news, politics and blogs
2     bnp.org.uk              33           -0.556       British National Party
3     www.politico.com        31           -0.499       American political journalism, conservative
                                                        and Republican bias
4     www.worldnetdaily.com   24           -0.483       news and associated content, American
                                                        conservative perspective
5     www.salon.com           47           -0.459       online magazine, focuses on U.S. politics,
                                                        criticized for its left-leaning content
6     www.foxnews.com         13           -0.451       news channel perceived as promoting
                                                        conservative political positions
7     www.breitbart.com       30           -0.421       news site, its Blog & "Network" links tend
                                                        to run to the right within the U.S.
                                                        political spectrum
8     www.washingtonpost.com  11           -0.410       The Washington Post
9     news.yahoo.com          6            -0.394       news site provided by Yahoo!
10    www.alternet.org        20           -0.389       progressive/liberal activist news service
11    www.dailykos.com        43           -0.384       American political blog, liberal or
                                                        progressive point of view
12    www.npr.org             37           -0.373       news site of the National Public Radio, a
                                                        media organization that serves as a national
                                                        syndicator to most public radio stations in
                                                        the United States
13    www.cnn.com             7            -0.373       CNN
14    www.reuters.com         9            -0.342       Reuters
15    www.huffingtonpost.com  4            -0.338       liberal/progressive American news website
                                                        and aggregated blog

Fig. 6. Ratio of the average emotional response histogram of users to that of threads
(circles) and of websites to that of threads (squares). (a) All data, N_bins = 20.
(b) Threads with a threshold N_comments > 8, N_bins = 20.

Table 5. Top 15 websites (out of the top 50 most popular) with the highest ⟨e⟩_website.

Rank  Website                 Counts rank  ⟨e⟩_website  Description/category
1     blog.propertynice.com   49           0.338        properties
2     www.squidoo.com         15           0.310        publishing platform for posting overview
                                                        material on topic of interest, e.g.
                                                        "50 Things you can Reuse"
3     hubpages.com            34           0.189        publishing tool, all sorts of topics
4     www.engadget.com        41           0.186        gadgets, technology
5     www.collegehumor.com    28           0.138        humour
6     i.gizmodo.com           17           0.132        gadgets, technology
7     www.theonion.com        50           0.118        web site of a parody newspaper
8                                                       image hosting and video hosting website
9                                                       website of a computer magazine
10                                                      American comedy website
11                                                      video sharing website
12                                                      news and reviews, analysis of technol-
13    news.cnet.com           23           0.064        top technology news headlines
14    www.break.com           35           0.026        humor website, formerly Big-boys.com
15    blog.wired.com          25           0.009        aggregated blogs of Wired.com, an online
                                                        technology news website

4.4. Background removed 

In order to make the differences between the distributions for threads, users
and websites more visible, a background removal procedure was carried out. By background
we mean here the distribution for threads. We divide the counts of the ⟨e⟩ histogram for
users/websites (dark-coloured bars in Fig. 5a-b) by the corresponding counts of the ⟨e⟩
histogram for threads (dashed bars). The ratio is plotted for data with (Fig.
6b) and without (Fig. 6a) a threshold in the number of comments.
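
The background removal amounts to a bin-by-bin division of two histograms built on the
same bins. The sketch below is an illustration only: the function names are hypothetical,
and returning `None` for bins in which the thread histogram is empty is an assumption of
this example, since the text does not say how such bins were treated.

```python
from typing import Iterable, List, Optional, Sequence


def histogram(values: Iterable[float], n_bins: int = 20,
              low: float = -1.0, high: float = 1.0) -> List[int]:
    """Count values in n_bins equal-width bins spanning [low, high]."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        # The upper edge (v == high) is assigned to the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def remove_background(group_counts: Sequence[int],
                      thread_counts: Sequence[int]) -> List[Optional[float]]:
    """Divide the user/website histogram by the thread histogram, bin by bin."""
    if len(group_counts) != len(thread_counts):
        raise ValueError("both histograms must use the same bins")
    # Assumption: a bin with no threads has an undefined ratio (None).
    return [g / t if t > 0 else None for g, t in zip(group_counts, thread_counts)]


# Hypothetical average responses, binned into 4 bins for brevity.
thread_hist = histogram([-1.0, -0.6, 0.0, 0.9, 1.0], n_bins=4)
ratio = remove_background([1, 0, 1, 1], thread_hist)
```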

For data without any cut-off we get similar plots for both websites and users
(squares and circles in Fig. 6a, respectively). However, there is not much point in
inspecting them in detail on their own because, as we have already seen,
numerous short threads tend to blur the picture. More informative are the plots for data
with a cut-off. When a threshold is introduced (N_comments > 8),
the character of the plot for websites does not change (squares in Fig. 6b), while the plot
for users transforms into one resembling a U-shape (circles in Fig. 6b).

April 17, 2012 1:49 WSPC/INSTRUCTION FILE acs'digg

Statistical Analysis of Emotions and Opinions at Digg Website

Based on the figures a number of hypotheses could be formulated. However, in
order to avoid false conclusions, additional graphs were plotted; they shed
some more light on the matter under investigation. Namely, Fig. 7 presents the average
number of discussions (threads) initiated by websites (squares) and users (circles),
grouped according to the value of the average emotional response to materials coming
from these websites or submitted by these users.
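
A quantity of the kind plotted in Fig. 7 can be computed as sketched below. The input
layout (one `(average_response, n_threads)` pair per website or user), the binning and
the function name are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_threads_per_bin(sources: Iterable[Tuple[float, int]], n_bins: int = 20,
                         low: float = -1.0, high: float = 1.0) -> Dict[int, float]:
    """Average number of threads initiated by the sources falling into each bin."""
    width = (high - low) / n_bins
    # bin index -> [total number of threads, number of sources]
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for e_avg, n_threads in sources:
        idx = min(int((e_avg - low) / width), n_bins - 1)
        totals[idx][0] += n_threads
        totals[idx][1] += 1
    return {idx: total / count for idx, (total, count) in sorted(totals.items())}


# Hypothetical websites: (average emotional response, number of threads initiated).
websites = [(-0.9, 1), (-0.95, 1), (-0.2, 10), (-0.25, 30)]
means = mean_threads_per_bin(websites, n_bins=4)
```

A mean close to 1 in a bin indicates that the sources in that bin initiated a single
thread each.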

Now let us return to the analysis of the results shown in Fig. 6. Most interesting
is the difference between the graphs for websites and users once the threshold is
introduced. They show that the group of websites receiving exclusively very positive
responses and the groups of users receiving exclusively either very positive or very
negative responses are relatively more numerous. On the other hand, the group of websites
receiving exclusively very negative responses is relatively less numerous.

We could hypothesise that largely negative threads group into a relatively small
number of websites, which in turn originate many discussions. However, this logic
fails when confronted with Fig. 7. Here, for values between -1 and -0.7, the average
number of threads is equal or very close to 1, meaning that almost all websites which
receive ⟨e⟩ values from this region initiated only one thread. The same is true for
the region of positive ⟨e⟩ (from 0.5 upwards) and for users (in both the negative and
positive regions). Also, in the case of users, those whose submitted material received ⟨e⟩
values between -0.4 and 0.5 are relatively slightly less numerous.

It is worth emphasising that the observed relations are not specific to the
threshold used (N_comments > 8). Starting with threads of 5 or more
comments, the U-shape for users clearly appears and becomes more and more evident
as the threshold increases. The plots for websites also show the same
behaviour irrespective of the threshold used.

Fig. 7. Average number of discussions (threads) initiated by websites (squares) and users (circles)
that have received a specific average emotional response. All threads with a threshold N_comments >

5. Correlations

The objective of this part of the analysis was to determine whether the emotional
content obtained by the computer program (both continuous and binary classifications
were used) correlates with the number of users' approvals (diggs up), disapprovals (diggs
down) or some function thereof. The data were divided into a number of subsets for
which correlation coefficients were calculated. In addition, some graphs presenting
the behaviour for selected ranges were plotted.



5.1. Correlations of diggs and emotional response 



Fig. 8. (a) Correlation coefficients between the number of diggs N_diggs and the average
emotional response ⟨e⟩_thread versus the threshold on the number of comments N_comments.
Squares: correlation coefficients for all threads; triangles: correlation coefficients for
threads with ⟨e⟩_thread > 0; circles: correlation coefficients for threads with
⟨e⟩_thread < 0. (b) Correlation coefficients between the number of comments N_comments and
the average emotional response ⟨e⟩_thread versus the threshold on the number of comments
N_comments. Squares: correlation coefficients for all threads; triangles: correlation
coefficients for threads with ⟨e⟩_thread > 0; circles: correlation coefficients for threads
with ⟨e⟩_thread < 0.



For main posts (stories) only the number of approvals (diggs up) is registered
by the Digg.com mechanism, hence it was the only metric that could be used for determining
the correlation with the average emotional response $\langle e \rangle_{\mathrm{thread}}$. The
correlation coefficient between the number of diggs $N_{\mathrm{diggs}}$ and
$\langle e \rangle_{\mathrm{thread}}$ for all commented stories (threads) equals -0.027, implying
no correlation on a global level. In order to establish whether any correlation can be observed
for limited ranges of data, correlation coefficients were calculated for various data subsets
obtained by introducing different thresholds on the number of comments (a 'high-pass'
mechanism, i.e. only threads with a specific or higher number of comments were taken into
account). In Fig. 8(a) the obtained coefficients are plotted versus the threshold values. For
example, a threshold point equal to 200 means that the coefficient was calculated for the
subset of threads with 200 or more comments.

As Fig. 8(a) reveals, the correlation between $\langle e \rangle_{\mathrm{thread}}$ and
$N_{\mathrm{diggs}}$ (squares) is positive and, except for an initial bump, increases with the
length of the threads (linearly over a long range of values). This behaviour is accounted for
mainly by threads with negative $\langle e \rangle_{\mathrm{thread}}$ (circles in Fig. 8(a)), owing
to their prevailing number. Counterintuitively, the correlation for threads with positive
$\langle e \rangle_{\mathrm{thread}}$ is negative for all thresholds (triangles in Fig. 8(a)).
Initially it sharply increases and from around the threshold of 25 it stays at more or less the
same level. One would expect that more popular stories (i.e. those with a larger number of
diggs) trigger more positive responses. The data imply the contrary: among these threads,
popular stories tend to have lower (though still positive) $\langle e \rangle_{\mathrm{thread}}$.
This is most pronounced in the region where small threads dominate, where the most negative
value of the correlation coefficient is observed.
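The 'high-pass' thresholding described above can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple layout `(n_comments, n_diggs, mean_emotion)`, the function names and the toy numbers are assumptions made here, not the authors' code or data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # one of the variables is constant
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def high_pass_correlation(threads, thresholds):
    """Correlation of diggs with mean emotion for threads with >= t comments.

    threads: list of (n_comments, n_diggs, mean_emotion) tuples (hypothetical layout).
    Returns a dict mapping each threshold t to the coefficient (or None).
    """
    result = {}
    for t in thresholds:
        subset = [(d, e) for n, d, e in threads if n >= t]
        diggs = [d for d, _ in subset]
        emotions = [e for _, e in subset]
        result[t] = pearson(diggs, emotions)
    return result

# Toy data, invented purely for illustration.
toy_threads = [
    (5, 10, -0.2), (12, 40, -0.5), (30, 90, -0.3),
    (80, 150, -0.1), (200, 400, 0.0), (250, 380, -0.05),
]
coefficients = high_pass_correlation(toy_threads, [1, 30, 200])
```

Restricting the subset to threads with exclusively positive or negative mean emotion before calling `high_pass_correlation` would give the analogue of the triangle and circle series.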

Similarly, Fig. 8(a) also implies that it is the stories with $\langle e \rangle_{\mathrm{thread}}$
close to zero (either positive or negative) that attract larger numbers of diggs. A positive
correlation for negative $\langle e \rangle_{\mathrm{thread}}$ means that the higher (closer to
zero) the negative values, the more diggs the stories obtain. In addition, correlations for two
"low-pass" thresholds were also calculated and are listed in Table 6, showing explicitly that
it is the impact of shorter threads that hides, in the overall data, the correlations for
longer threads.

Table 6. Correlation coefficients c between the average
emotional response $\langle e \rangle_{\mathrm{thread}}$ and the number of diggs
$N_{\mathrm{diggs}}$ for threads with an imposed threshold
$N_{\mathrm{comments}} < x$; M stands for the number of threads.

x      c         M
100    -0.017    125399
150    -0.019    127038


5.2. Correlations of comments and average emotional response 

Calculations similar to those for diggs were carried out for the correlation between
$\langle e \rangle_{\mathrm{thread}}$ and the number of comments $N_{\mathrm{comments}}$
(Fig. 8(b)). The correlation coefficient between the average emotional response and the number
of comments is slightly negative, with a minimum of about -0.2 for the threshold point equal to
around 50. The cut through around the point 500 should be treated with care, as fluctuations
are more probable in this region due to lower statistics. The plot implies that longer
threads tend to be (slightly) more negative. In order to further investigate the
relation between $\langle e \rangle_{\mathrm{thread}}$ and $N_{\mathrm{comments}}$, coefficients
for threads with exclusively negative and exclusively positive $\langle e \rangle_{\mathrm{thread}}$
were calculated and plotted in Fig. 8(b).

The relatively high (in the case of $\langle e \rangle_{\mathrm{thread}} < 0$) and low (in the
case of $\langle e \rangle_{\mathrm{thread}} > 0$) initial values stem from the fact that, in the
region where short (but solely negative or solely positive) threads dominate, their large (in
absolute value) $\langle e \rangle_{\mathrm{thread}}$ have a huge impact. This results in,
respectively, very high (the longer the negative thread, the less negative it gets) and very
low (the longer the positive thread, the less positive it gets) values of the coefficients. As
was the case with diggs, the correlation for $\langle e \rangle_{\mathrm{thread}} > 0$ is,
surprisingly, (slightly) negative for larger values of the comment threshold as well. For
$\langle e \rangle_{\mathrm{thread}} < 0$ the expected behaviour occurs for the threshold range
between 10 and 300: the negative value of the correlation indicates that (for this range)
longer threads indeed tend to be more negative. However, this tendency does not hold for
exclusively long threads.

In the case of comments, the data also contained information about the number
of disapprovals (diggs down, negative diggs) submitted by users. This allows us to
consider another quantity, the digg difference, defined as

$$\Delta d = d_{\mathrm{up}} - d_{\mathrm{down}} \qquad (2)$$

where $d_{\mathrm{up}}$ ($d_{\mathrm{down}}$) is the number of diggs up (down) submitted to the
comment. The histogram $H(\Delta d)$ is shown in Fig. 9, suggesting a similar law behind
the process of issuing both positive and negative diggs. However, as the largest $|\Delta d|$
for the positive branch is about 10 times the value for the negative one, and taking
into account that $H(|\Delta d|)$ for $\Delta d < 0$ drops much faster than for $\Delta d > 0$, it
seems that the users are much more reluctant to submit negative diggs.




Fig. 9. Log-log histogram of the absolute value of digg difference $H(|\Delta d|)$. Triangles represent
the branch $\Delta d < 0$ while circles represent $\Delta d > 0$. The solid lines represent power-law
fits to the data: for positive $\Delta d$ the line follows $H(|\Delta d|) \sim |\Delta d|^{-2.2}$, while
for negative $\Delta d$ it is $H(|\Delta d|) \sim |\Delta d|^{-3.1}$.
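A minimal sketch of how the two branches of the histogram and a power-law exponent could be obtained is given below. The fitting procedure actually used for Fig. 9 is not stated in the text; the least-squares fit on log-log axes used here is an assumption chosen for simplicity (it is known to be less reliable than maximum-likelihood estimation), and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log

def branch_histograms(votes):
    """Histogram |delta d| separately for positive and negative digg difference.

    votes: iterable of (d_up, d_down) pairs, one per comment (assumed layout).
    Comments with delta d == 0 are left out of both branches.
    """
    positive, negative = Counter(), Counter()
    for d_up, d_down in votes:
        delta = d_up - d_down
        if delta > 0:
            positive[delta] += 1
        elif delta < 0:
            negative[-delta] += 1
    return positive, negative

def fit_power_law_exponent(values, counts):
    """Slope of the least-squares line through (log value, log count)."""
    points = [(log(v), log(c)) for v, c in zip(values, counts) if v > 0 and c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points with positive value and count")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all values are identical; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

# Synthetic check: counts generated from an exact power law with exponent -2.2.
toy_values = [1, 2, 4, 8, 16]
toy_counts = [1000.0 * v ** -2.2 for v in toy_values]
toy_slope = fit_power_law_exponent(toy_values, toy_counts)
```

On real data the fit would be applied to the keys and counts of each `Counter` returned by `branch_histograms`, ideally after logarithmic binning to tame the noisy tail.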

At this level of analysis it is also possible to check the relation between the
average emotional value of comments $\langle e_c \rangle$ and the digg difference $\Delta d$. All
comments that acquired a specific value of digg difference $\Delta d$ were grouped together and
their average emotional value was calculated (see grey points in Fig. 10(a)). Figure 10(a)
suggests that the sentiment of comments characterised by a negative value of $\Delta d$
tends to be more negative than that of comments with $\Delta d > 0$. Moreover,
the value of $\langle e_c \rangle$ saturates for higher $\Delta d$, being close to the average
emotional value of the whole data set ($\langle e_c \rangle_{\mathrm{Digg}} = -0.16$, marked with
a dotted line in Fig. 10(a)).

One has to take into account the fact that the values shown in Fig. 10(a) are
not normalised. In order to present a more accurate quantity one should divide
$\Delta d$ by $\Sigma d = d_{\mathrm{up}} + d_{\mathrm{down}}$, thus obtaining the relative value
of the digg difference. The results, marked with squares, are plotted in Fig. 10(b),
demonstrating a clear minimum around $\Delta d / \Sigma d = 0$ and increasing values for both
positive and negative $\Delta d / \Sigma d$, once again stopping in the vicinity of
$\langle e \rangle_{\mathrm{Digg}} = -0.16$. This leads to a rather striking conclusion: no matter
whether all diggs submitted to a comment are positive or negative, its content will, on average,
have a similar emotional value. A possible explanation is the following: a comment with
emotional content close to the average value does not provoke a separation of opinions, so it
is either commonly liked or commonly disliked. On the other hand, those comments that divide
the users into almost equal fractions seem to have very low $\langle e_c \rangle$. Bearing in mind
that the above conclusion might be an artefact, we checked the relation between
$\langle e_c \rangle$ and $\Delta d / \Sigma d$ for four different user communities obtained via
eigenvalue spectral analysis of the weighted bipartite (i.e., users and comments) network of
the most popular comments (for method details see [5, 7, 8]). The results, shown for the two
largest communities with number of comments $N_c = 10214$ (circles in Fig. 10(b)) and
$N_c = 51166$ (triangles in Fig. 10(b)), confirm the previous observations. Although there are
small discrepancies, the tendency stays the same, suggesting that the described behaviour is
common regardless of the data set partition scheme.
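The normalisation step can be illustrated with a short sketch that computes the relative digg difference and averages the emotional value within equal-width bins on [-1, 1]. The record layout `(d_up, d_down, emotion)`, the number of bins and the toy values are assumptions made for illustration; the binning used for Fig. 10(b) is not specified in the text.

```python
def relative_digg_difference(d_up, d_down):
    """(d_up - d_down) / (d_up + d_down); None for comments without any diggs."""
    total = d_up + d_down
    if total == 0:
        return None
    return (d_up - d_down) / total

def binned_mean_emotion(comments, n_bins=10):
    """Average emotional value per bin of the relative digg difference.

    comments: iterable of (d_up, d_down, emotion) tuples (hypothetical layout).
    Returns (bin_centre, mean_emotion, count) for every non-empty bin.
    """
    width = 2.0 / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d_up, d_down, emotion in comments:
        r = relative_digg_difference(d_up, d_down)
        if r is None:
            continue
        # r == 1.0 would land one past the end, so clamp to the last bin
        idx = min(int((r + 1.0) / width), n_bins - 1)
        sums[idx] += emotion
        counts[idx] += 1
    return [
        (-1.0 + (i + 0.5) * width, sums[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]

# Toy comments, invented for illustration only.
toy_comments = [(4, 0, -0.1), (2, 2, -0.6), (0, 4, -0.2), (1, 1, -0.4)]
toy_bins = binned_mean_emotion(toy_comments, n_bins=4)
```

Running the same function on the comments of a single user community would reproduce the per-community check described above, given a community assignment obtained elsewhere.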




Fig. 10. (a) Average emotional value of comments $\langle e_c \rangle$ versus the digg difference
$\Delta d$. Grey points are real data while circles are obtained using a 5-point binning. The dotted
line marks the average emotional value of the whole data set $\langle e \rangle_{\mathrm{Digg}} = -0.16$
and solid lines indicate the levels $\langle e \rangle = 1$ and $\langle e \rangle = -1$.
(b) Average emotional value of comments $\langle e_c \rangle$ versus the normalised digg difference
for all data (squares) and selected user communities (circles and triangles). The dotted line marks
the average emotional value of the whole data set $\langle e \rangle_{\mathrm{Digg}} = -0.16$.

5.3. Correlation of comments and diggs

The correlation between the number of diggs $N_{\mathrm{diggs}}$ and the number of comments
$N_{\mathrm{comments}}$ (Fig. 11) shows a very interesting behaviour. The coefficient is very high
(over 0.8) for the data where small threads dominate and consistently decreases for
longer threads, though it remains positive throughout (slightly over 0.2 at the
lowest). The behaviour has a convincing heuristic explanation. Longer threads
usually develop through multiple comment exchanges between a limited
number of users (usually a few, but exchanges between just two people
are also frequent). Those users post additional comments, but do not digg the story
again: the system does not allow it, even if they had such an (unlikely) wish.
Hence the discrepancy between the numbers of diggs and comments for long discussions.
For short and medium-sized threads such a duality does not occur and the numbers
are roughly the same.




Fig. 11. Correlation coefficients between the number of diggs $N_{\mathrm{diggs}}$ and the number of
comments $N_{\mathrm{comments}}$ versus the threshold on the number of comments $N_{\mathrm{comments}}$.



6. Average response to a story 

As a development of the analysis of digg correlations from the previous section, we
examine here the dependence of the average emotional response to a story
$\langle e \rangle_{\mathrm{thread}}$ on the number of diggs $N_{\mathrm{diggs}}$ and comments
$N_{\mathrm{comments}}$ the story receives. The graphs presented in Fig. 12 exhibit a very
interesting behaviour. They imply that there is a specific value at which the average emotional
response assumes a minimum. The point in question is equal to approximately
$N_{\mathrm{diggs}} = 50$ in the case of diggs (Fig. 12(a)) and approximately
$N_{\mathrm{comments}} = 20$ in the case of comments (Fig. 12(c)).

Beyond that point the graphs differ: $\langle e \rangle_{\mathrm{thread}}(N_{\mathrm{diggs}})$
fluctuates around a fixed value (Fig. 12(b)) whereas
$\langle e \rangle_{\mathrm{thread}}(N_{\mathrm{comments}})$ visibly decreases towards the end
(Fig. 12(d)). The behaviour for comments is of great importance as it indicates that
(beyond a certain length) longer threads tend to be more negatively charged. Hence
we could assume that it is the negative comments that fuel the communication.
Similar conclusions were reached in [20], where it was observed that confrontational
or abusive discussions lasted much longer than neutral ones. It is worth mentioning
that the overall behaviour of the two graphs is consistent with the previous results
concerning the correlation between the number of diggs and the number of comments. As
was presented in Fig. 11, for short threads the correlation is very high and it substantially
decreases for longer discussions. That is also the case here: a minimum for short threads is
observed in the plots for diggs and comments alike, while the behaviour of the graphs for
longer discussions does not match.

Fig. 12. Average emotional response to a story $\langle e \rangle_{\mathrm{thread}}$ (a-b) versus the
number of diggs given to a story $N_{\mathrm{diggs}}$, (c-d) versus the thread's length. Small squares
are original data, big squares represent averaging with 5 bins. Plots (a,c) are close-ups of plots
(b,d), respectively.

In order to explain the minimum seen for shorter threads,
average response histograms for the initial groups of threads were plotted in Fig. 13.
It can be assumed that the ratio of the probability that all (or almost all) comments
in a thread are negative to the analogous probability for positive dominance
is responsible for the behaviour in question. For very short threads the probabilities
of extreme values of $\langle e \rangle_{\mathrm{thread}}$ are roughly the same (cf. the equal bars
in Fig. 13(a)). For threads of 10-20 comments a major dominance of very low negative values is
observed, culminating for threads of around 20 comments. For still longer
discussions the dominance tends to diminish, as the probability of obtaining solely
negative comments in longer threads is very low. Around the value of 50 comments
both probabilities for extreme $\langle e \rangle_{\mathrm{thread}}$ values are again similar (this
time both are very close to zero), though the histogram as a whole is slightly shifted to the left.

Fig. 13. Average emotional response histograms for threads of specified length N: (a) $N \in [0, 10]$,
(b) $N \in [11, 20]$, (c) $N \in [21, 30]$, (d) $N \in [31, 40]$, (e) $N \in [41, 50]$, (f) $N \in [51, 60]$.
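The statement that solely negative (or solely positive) threads become improbable as the length grows can be illustrated with a back-of-the-envelope estimate. It rests on a simplifying assumption that is not made in the text, namely that each comment is negative independently with a fixed probability $p_-$ (and positive with $p_+$); the value $p_- = 0.6$ is purely hypothetical.

```latex
\[
P(\text{all } N \text{ comments negative}) = p_-^{N}, \qquad
P(\text{all } N \text{ comments positive}) = p_+^{N}.
\]
\[
p_- = 0.6: \quad p_-^{10} \approx 6.0 \times 10^{-3}, \qquad
p_-^{50} \approx 8.1 \times 10^{-12}.
\]
```

Under this assumption both extremes vanish quickly with N, in line with the observation that for threads of around 50 comments the probabilities of extreme values are both very close to zero. Real comments are unlikely to be independent, so this is an order-of-magnitude sketch only.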

7. Influence of emotions on thread's life and end 

The issue under examination was how emotional content of comments changed with 
the development of a thread and whether or not its level at the beginning of a thread 
had any impact on how long the thread lived. 

7.1. Comment sequence 

The approach concentrated on the sequence of comments in a thread, regardless of
how much time elapsed between the publication of consecutive comments. The following
procedure was applied: threads of the same size were grouped together and, starting
from the 10th comment in a thread, for each point a moving average of the emotional
content of the last 10 comments was calculated. Graphs for a few selected groups
of threads are presented in Fig. 14. Figure 14(a) (and, very clearly, Fig. 14(b)) suggests
that the average emotional content both at the beginning and at the end of a thread
increases with the thread's length. The series start with negative values
and grow towards zero with the thread length. However, Fig. 14(c) implies that this
may be the case only for shorter threads; graphs for threads of 80, 100
and 120 comments do not show the regularity. Figure 15, presenting the emotional
level versus the thread's length at the beginning (a) and at the end (b) of a thread,
supports the observation. The increase is present only for threads of the length
between 20 and 60 comments, after which we observe a saturation. One feasible heuristic
explanation for this behaviour is that when the initial comments are emotionally
more negative the thread wears out relatively quickly, because the participants give
vent to their emotions early on in the thread and later do not have enough emotional
potential to carry on the discussion.

Fig. 14. Average emotional content in the thread $\langle e \rangle_{\mathrm{thread}}$ versus the message
number from the start. Each point is calculated as a moving average over the last 10 comments.
(a) Threads of length 20 (squares), 40 (circles), 60 (triangles) and 80 (diamonds). (b) Threads of
length 25 (squares), 35 (circles), 45 (triangles) and 55 (diamonds). (c) Threads of length 80
(squares), 100 (circles) and 120 (triangles).
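The moving-average procedure described above can be sketched as follows. The representation of a thread as a plain list of per-comment emotional values and the function names are assumptions introduced for this illustration; the window of 10 comments follows the text.

```python
def trailing_moving_average(emotions, window=10):
    """Mean of the last `window` values, reported from comment number `window` on.

    emotions: per-comment emotional values of one thread, in posting order.
    Returns a list of (comment_number, moving_average), comment numbers 1-based.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    running = 0.0
    for i, value in enumerate(emotions):
        running += value
        if i >= window:
            running -= emotions[i - window]  # drop the value that left the window
        if i >= window - 1:
            result.append((i + 1, running / window))
    return result

def average_profile(threads, length, window=10):
    """Average the moving-average curves over all threads of exactly `length` comments."""
    group = [t for t in threads if len(t) == length]
    if not group or length < window:
        return []
    curves = [trailing_moving_average(t, window) for t in group]
    n_points = len(curves[0])
    return [
        (curves[0][k][0], sum(curve[k][1] for curve in curves) / len(curves))
        for k in range(n_points)
    ]

# Toy threads with emotional values in {-1, 0, 1}, invented for illustration.
toy_threads = [[1, 1, 1], [-1, -1, 1], [0, 0]]
toy_profile = average_profile(toy_threads, length=3, window=2)
```

The first and last points of each averaged curve correspond to the emotional level in the first and last `window` comments of a thread.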

Bearing in mind the previous results, one can also notice that the behaviour of the
emotional level at the beginning and at the end of a thread for the range of thread lengths
analysed in Fig. 15 (20-160 comments) resembles to a large extent the behaviour
of the average emotional response to stories for that very range (cf. Fig. 12): an initial
increase for the range between 20 and 60 is followed by saturation (or a very gradual
decrease in the case of the average emotional response) for values between 60 and 160.
This may imply that the emotional content of the first 10 and the last 10 comments
in a thread is to some degree representative of the overall emotional content of a
thread, at least for the range of thread lengths in question.

Fig. 15. Emotional level versus thread's length at the beginning, i.e. in the first 10 comments (a),
and at the end of the thread, i.e. in the last 10 comments (b). Grey points are original data, squares
represent averaging with 5 bins.




Fig. 16. Average length of a thread $\langle L \rangle$ versus (a) the absolute emotional value in the
first 10 comments $|\langle e \rangle_{10}|$ and (b) the emotional value in the first 10 comments
$\langle e \rangle_{10}$. Grey points in plot (b) represent the number of threads with a specific
emotional value.

In order to more fully determine whether emotional content of the first 10 com- 
ments influences the length of the thread, averages of thread's length for different 
levels of values of initial emotions were calculated. For this analysis all threads 
having 10 or more comments were taken into account. 

Initially, an approach using the absolute value of initial emotions was applied. As
Fig. 16(a) shows, in such a case the values of the average thread length vary little and one
cannot say that a highly emotional launch (be it negative or positive) generally leads
to a longer or shorter discussion. However, taking into account the whole range of
possible values (between -1 and 1) reveals more information (Fig. 16(b)).

One should start analysing Fig. 16(b) with the observation that, in addition to the
average thread length, the counts of instances that fall into
particular bins are plotted in grey. For the two most positive values of average emotional
content these counts are small and hence the last two points should be treated with care.
Nonetheless, even when the last two points are not taken into account an interesting trend
can be noticed. Namely, for negative and slightly positive initial emotional content
(up to the value of 0.3) one observes a stable level of the average thread length, whereas
for values between 0.3 and 0.8 a drop is clearly visible. This implies that in the
population of all threads of 10 comments or more it is the mildly and
highly positive launches that lead to shorter discussions. The intuitive explanation,
based on everyday observation, is that when people agree there is not really much
point in carrying on the conversation.
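A minimal sketch of the analysis behind Fig. 16(b) is given below: threads are binned by the mean emotional value of their first 10 comments, and the average length and the number of threads are reported per bin. The thread representation (a list of per-comment emotional values), the number of bins and the toy data are assumptions made here for illustration.

```python
def length_by_initial_emotion(threads, first=10, n_bins=10):
    """Average thread length per bin of the mean emotion of the first `first` comments.

    threads: iterable of lists of per-comment emotional values in [-1, 1].
    Threads shorter than `first` comments are skipped, as in the text.
    Returns (bin_centre, average_length, thread_count) for every non-empty bin.
    """
    width = 2.0 / n_bins
    total_length = [0] * n_bins
    counts = [0] * n_bins
    for emotions in threads:
        if len(emotions) < first:
            continue
        initial = sum(emotions[:first]) / first
        # initial == 1.0 would land one past the end, so clamp to the last bin
        idx = min(int((initial + 1.0) / width), n_bins - 1)
        total_length[idx] += len(emotions)
        counts[idx] += 1
    return [
        (-1.0 + (i + 0.5) * width, total_length[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]

# Toy threads, invented for illustration; `first=2` keeps the example small.
toy_threads = [[-1, -1, 0, 1], [1, 1], [1, 0, 0], [0]]
toy_result = length_by_initial_emotion(toy_threads, first=2, n_bins=2)
```

Returning the per-bin counts alongside the averages makes it easy to flag sparsely populated bins, such as the two most positive ones discussed above. Replacing `initial` by its absolute value would give the analogue of Fig. 16(a).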

This result is not, however, in agreement with the one obtained earlier for the
population of threads of the length between 20 and 60 comments (Fig. 12; smaller
values of the average initial emotional content were plotted for shorter threads). This implies
that while the regularity described in the previous paragraph holds true for the
whole population, it may not apply to particular subsets, where local (even
reverse) patterns may occur.

8. Conclusions 

In the course of the analysis a number of possible study directions were explored,
some of which provided interesting results. The conclusions deemed by
the authors to be the most significant are listed below.

It was established that, beyond a certain thread length, longer threads tend to
carry a more negative emotional charge. We can assume that
it is the negative emotions that (starting from some point) propel the discussion in
longer threads. However, for the thread to develop, for certain lengths it should not
be launched with highly negative emotions. If the first comments are largely negative
the thread dies quickly. A possible explanation of this mechanism is that when
participants give vent to their emotions early on in the thread, they later do not
have enough emotional potential to carry on the discussion. Similarly, mildly and highly
positive launches tend to lead to shorter discussions. This suggests the presence of two
different mechanisms governing the evolution of the discussion and, consequently,
its length.

With the use of averages it was ascertained that the most negative emotional
responses were prompted by websites dealing with politics. On the other hand,
those concerned with technology, advice or humour generated the most positive
reactions. The most successful websites in terms of the number of appearances and
initiated discussions were those that received an average response of slightly or mildly
negative values.

Contrary to expectations, no correlation was found between the number of diggs
received by a comment and its emotional charge. This leads to the conclusion that
digging and burying are driven by how interesting the comment is, rather than by
its emotional content. People vote for comments that are interesting, witty or say
something familiar and disapprove of those which are boring, irrespective of the
emotions they convey.

While analysing correlations at the story level, an interesting behaviour of the
relation between the number of diggs and the number of comments received by
a story was found. The correlation between the two quantities is high for data
where small threads dominate and consistently decreases for longer threads (though
staying positive throughout). This behaviour has a convincing explanation. Namely,
longer threads are formed as a result of exchanges between a limited number of
discussion participants. No matter how many comments these users write in a specific
thread, they digg the story only once. Hence the longer the thread, the wider the
discrepancy between the number of diggs and the number of comments.

Results indicate that threads with small number of diggs (corresponding to small 
number of comments) are relatively more objective. With the increase in the length 
and the number of diggs the threads' subjectivity increases. 

It is worth highlighting that a considerable number of the distributions plotted for
Digg.com followed (at least for some ranges of values) power laws. This included
the distributions of comments and diggs as well as users' productivity. Thus this work
also confirms that scale-free relations are often found in data concerning
online behaviour.

Acknowledgments 

The work was supported by EU FP7 ICT Project Collective Emotions in Cyberspace 
CYBEREMOTIONS and Polish Ministry of Science Grant 1029/7.PR UE/2009/7. 
J.S. and J.A.H acknowledge the support from the European COST Action MP0801 
Physics of Competition and Conflicts and from the Polish Ministry of Science Grant 
No. 578/NCOST/2009/0. J.S. acknowledges support from a special grant of the 
Dean of the Faculty of Physics, WUT. 

References 

[1] Ball P., Critical Mass: How One Thing Leads to Another. (Farrar, Straus and Giroux;
London, 2004).

[2] Castellano C., Fortunato S., Loreto V., Statistical physics of social dynamics. Rev.
Mod. Phys. 81 (2009) 591.

[3] Axelrod R., The Dissemination of Culture: A Model with Local Convergence and
Global Polarization. J. Conflict Resolut. 41 (1997) 203.

[4] Sznajd-Weron K., Sznajd J., Opinion Evolution in Closed Community. Int. J. Mod.
Phys. C 11 (2000) 1157-1165.

[5] Mitrovic M., Tadic B., Bloggers behavior and emergent communities in blog space.
Eur. Phys. J. B 73 (2010) 293-301.

[6] Schweitzer F., Garcia D., An Agent-Based Model of Collective Emotions in Online
Communities. Eur. Phys. J. B 77 (2010) 533-545.

[7] Mitrovic M., Paltoglou G., Tadic B., Networks and emotion-driven user communities
at popular blogs. Eur. Phys. J. B 77 (2010) 597-609.




[8] Mitrovic M., Paltoglou G., Tadic B., Quantitative analysis of bloggers collective be- 
havior powered by emotions. J. Stat. Mech. (2011) P02005. 

[9] Chmiel A., Sobkowicz P., Sienkiewicz J., Paltoglou G., Buckley K., Thelwall M.,
Holyst J. A., Negative emotions boost user activity at BBC forum, Physica A 390
(2011) 2936-2944.

[10] Chmiel A., Sienkiewicz J., Paltoglou G., Buckley K., Thelwall M., Kappas A., Holyst 
J. A., Collective Emotions Online and Their Influence on Community Life, PLoS 
ONE 6(7) (2011) e0022207. 

[11] Weroński P., Sienkiewicz J., Paltoglou G., Buckley K., Thelwall M., Holyst J. A.,
Emotional Analysis of Blogs and Forums Data, e-print arXiv (2011) 1108.5974.

[12] Garcia D., Garas A., Schweitzer F., Positive words carry less information than nega- 
tive words, e-print arXiv (2011) 1110.4123. 

[13] Czaplicka A., Chmiel A., Holyst J. A., Emotional Agents at the Square Lattice. Acta 
Phys. Pol. A 117(4) (2010) 688-694. 

[14] Chmiel A., Holyst J. A., Flow of emotional messages in artificial social networks. Int.
J. Mod. Phys. C 21 (2010) 593-602.

[15] Rank S., Docking Agent-based Simulation of Collective Emotion to Equation-based 
Models and Interactive Agents. Proceedings of Agent-Directed Simulation Symposium, 
2010 Spring Simulation Conference. (2010) 82-89. 

[16] Gobron S., Ahn J., Paltoglou G., Thelwall M., Thalmann D., Vis. Comput. 26 (2010) 
505. 

[17] Skowron M., Pirker H., Rank S., Paltoglou G., Gobron S., in Proceedings of the 24th
International FLAIRS Conference, (AAAI Press 2011).

[18] Skowron M., Rank S., Theunis M., Sienkiewicz J., LNCS 6974 (2011) 337. 

[19] Thelwall M., Wilkinson D., Uppal S., Data mining emotion in social network com- 
munication: Gender differences in MySpace. In Journal of the American Society for 
Information Science and Technology 61 (2010) 190-199. 

[20] Sobkowicz P., Sobkowicz A., Dynamics of hate based Internet user networks. Eur.
Phys. J. B 73 (2010) 633-643.

[21] Pang B., Lee L., Opinion mining and sentiment analysis. In Foundation and Trends 
in Information Retrieval 2 (2008) 1-135. 

[22] F. Sebastiani, Machine Learning in Automated Text Categorization. ACM Computing 
Surveys 34 (2002) 1-47. 

[23] I. Ounis, C. Macdonald, I. Soboroff, in Proceedings of the Second International Con- 
ference on Weblogs and Social Media (2008). 

[24] F. Peng, D. Schuurmans, and S. Wang, Language and task independent text categorization
with simple language models, in NAACL '03 (2003) 110-117.

[25] http://www.digg.com

[26] K. Lerman and A. Galstyan, Analysis of social voting patterns on digg. In Proceedings 
of WOSP (2008) 7-12. 

[27] Zhu Y., Measurement and analysis of an online content voting network: a case study 
of Digg. In Proceedings of the 19th international conference on WWW, (2010) 1039- 
1048. 

[28] Szabo G., Huberman B., Predicting the popularity of online content, Communications
of the ACM 53(8) (2010) 80-88.

[29] Lerman K., Ghosh R., Information contagion: an empirical study of the spread on
news on Digg and Twitter social networks, in Proceedings of the Fourth International
AAAI Conference on Weblogs and Social Media (2010) 90-97.

[30] Rangwala H., Jamali S., Defining a coparticipation network using comments on Digg,
IEEE Intelligent Systems 25(4) (2010) 36-44.




[31] Krapivsky P., Redner S., Leyvraz F., Connectivity of growing random networks, Phys.
Rev. Lett. 85 (2000) 4629.

[32] Barabasi A.-L., Albert R., Emergence of scaling in random networks. Science 286
(1999) 509-512.

[33] Krapivsky P., Redner S., Organization of growing random networks, Phys. Rev. E 63
(2001) 066123.



